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ABSTRACT 


EXTRACTION  OF  SINGLE  CHANNEL  CURRENT  FROM  CORRELATED  NOISE 
VIA  A  HIDDEN  MARKOV  MODEL.  John  L.  Walsh  (Sponsored  by  Fred  J. 
Sig worth).  Department  of  Cellular  and  Molecular  Physiology,  Yale  University  School 
of  Medicine,  New  Haven,  CT. 

Single-channel  patch-clamp  recordings  typically  involve  both  the  desired  signal 
and  an  overlying  correlated  noise.  Previous  methods  of  analyzing  these  recordings 
have  relied  on  heavy  filtering  so  that  the  signal  could  be  unambiguously  distinguished 
from  the  noise.  This  filtering  limited  the  time  resolution  of  the  record  and  conse¬ 
quently  adversely  affected  the  prediction  of  channel  parameters.  This  study  was  under¬ 
taken  to  develop  a  means  of  analyzing  patch-clamp  recordings  which  would  improve 
our  ability  to  estimate  kinetic  rate  constants.  A  method  is  presented  whereby  the  record 
is  pre-filtered  to  remove  the  noise  correlation  and  subsequently  processed  by  a  modifi¬ 
cation  of  an  existing  signal  processing  method,  the  hidden  Markov  process,  which 
accounts  for  the  pre-processing.  The  developed  process  offers  improvements  over  ex¬ 
isting  methods  by  addressing  the  problem  of  limited  time-resolution  and  by  taking  into 
account  the  nature  of  the  noise.  This  method  compares  favorably  to  the  existing  hidden 
Markov  process  when  both  are  applied  to  simulated  signals  overlayed  by  recorded 
noise,  especially  in  cases  of  low  signal-to-noise  ratios. 
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INTRODUCTION 


Research  in  the  field  of  cellular  membrane  channels  is  relying  more  and  more 
heavily  of  the  analysis  and  interpretation  of  patch-clamp  data.  Channel  switching  is 
generally  modelled  as  a  time-homogeneous  Markov  process  (Colquhoun  and  Hawkes, 
1981)  whose  elements  consist  of  Markov  states,  transition  rates  between  states,  and 
initial  state  probabilities.  There  is  good  evidence  to  suggest  that  this  model  is 
appropriate  (French  and  Horn,  1983;  Horn  and  Vandenberg,  1984)  although  some 
authors  suggest  alternative  models  (Liebovitch  et  al.,  1987).  Modelling  consequently 
consists  of  finding  the  elements  of  a  Markov  process  which  best  match,  in  some  sense, 
the  recorded  data. 

Traditionally,  analysis  of  the  data  to  determine  the  model  parameters  has 
proceeded  along  one  of  two  main  routes:  (1)  the  data  are  low-pass  filtered,  open  and 
closed  intervals  are  determined  by  threshold  detection  (Colquhoun  and  Sigworth, 

1983),  and  these  intervals  are  compared  with  those  derived  by  matrix  methods,  or  (2) 
competing  models  are  generated  and  their  relative  fits  to  the  data  are  compared  by 
maximum-likelihood  methods  (Horn  and  Lange,  1983). 

Inherent  to  both  of  these  analyses  is  the  issue  of  limited  time  resolution,  in  that 
short  switching  intervals  are  lost  in  the  filtering.  Strong  filtering  is  necessary  in  these 
methods  in  order  that  channel  transitions  be  unambiguously  recognized  in  the  noise.  A 
number  of  authors  have  addressed  this  issue  by  using  the  assumption  of  a  clear  dead¬ 
time  (Roux  and  Sauve,  1985;  Blatz  and  Magleby,  1986;  Yeo  et  al.,  1988;  Crouzy  and 
Sigworth,  1990),  but  this  approach  still  suffers  from  an  inherent  loss  of  information. 

In  addition,  the  above  methods  do  not  address  the  effects  of  noise,  which  have  been 
shown  by  McManus  et  al.  (1987)  to  cause  errors  in  the  estimates  of  both  long  and  short 
intervals. 
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Although  the  maximum-likelihood  methods  are  in  general  similarly  limited  by 
their  assumption  of  noiseless  data  or  ideal  filtering  (Horn  and  Lange,  1983;  Ball  and 
Samson,  1989),  Magleby  and  Weiss  (1990)  have  presented  a  fairly  general  method 
which  takes  these  effects  into  account.  Their  approach  begins  by  simulating  data  from 
a  model,  filtering  these  data  as  would  the  recording  equipment,  and  then  adding  a 
stretch  of  recorded  noise.  The  resulting  record  is  processed  in  exactly  the  same  way  as 
the  real  data  so  that  a  likelihood  comparison  may  be  made.  Best  model  parameters  are 
chosen  by  an  iterative  search  process.  Unfortunately,  their  method  does  not  account 
for  subconductance  levels,  and  is  also  very  computationally  intensive. 

A  novel  approach  to  the  problem  of  parameter  estimation  has  been  taken  by 
Chung  et  al.  (1990)  by  applying  a  signal  processing  algorithm  which  has  recently 
become  popular  for  speech-recognition  schemes,  known  as  hidden  Markov  modelling 
(HMM;  Rabiner,  1989).  This  method  may  be  viewed  as  the  combination  of  an 
extension  of  the  maximum-likelihood  method  described  by  Horn  and  Lange  (1983)  to 
account  for  noise,  with  efficient  parameter  re-estimation  formulas,  which  replace  the 
search  of  the  parameter  space  to  maximize  the  likelihood.  Hidden  Markov  modelling 
has  the  advantage  of  being  an  efficient,  full-likelihood  method  which  accounts  for 
noise,  and  which  can  account  for  subconductance  levels,  but  has  the  disadvantages  of 
assuming  unfiltered  data  and  white  background  noise. 

We  present  an  extension  of  the  method  of  Chung  et  al.  to  handle  filtered  data 
and  colored  noise.  We  undo  the  effects  of  filtering  by  passing  the  recorded  data 
through  a  filter  whose  transfer  function  is  the  inverse  of  that  of  the  recording 
equipment  (Fig.  IB).  We  subsequently  analyze  a  stretch  of  data  which  clearly  lacks  a 
signal  component  to  characterize  the  resulting  "unfiltered"  noise.  Using  the  results  of 
this  analysis  we  develop  an  ^-element  pre-whitening  filter  through  which  we  pass  our 
"unfiltered"  data.  By  so  doing,  we  change  the  problem  from  that  of  analyzing  a  1st- 
order  Markov  process  with  colored  noise  to  that  of  analyzing  an  «th-order  Markov 
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process  with  white  noise  (Fig.  1C).  By  regarding  ^-tuples  of  state  transitions  as  "meta¬ 
states,"  we  reduce  the  problem  to  first-order  and  solve  the  "meta-state"  model  by  the 
traditional  HMM  method  (Fig.  ID).  We  begin  our  presentation  with  a  brief  review  of 
the  theory  of  HMM  followed  by  a  presentation  of  our  extended  theory.  Next,  we 
examine  the  improvement  offered  by  the  extended  method  in  determining  kinetic  rate 
constants  for  Markov  processes  imbedded  in  correlated  noise.  Our  discussion  continues 
with  one  example  of  the  power  of  this  technique  and  concludes  with  a  look  at  its 
advantages  and  disadvantages. 
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THEORY 

Traditional  Hidden  Markov  Modelling 

We  define  a  hidden  Markov  model  by  its  elements:  a  matrix  A  of  kinetic 
parameters  atj ,  a  vector  B  of  output  probability  functions  bj ,  and  a  vector  n  of  initial 
state  probabilities  ni  (Rabiner,  1989).  The  elements  aiyof  matrix  A  represent  the 
probability  that  a  channel  in  state  q.  will  move  to  state  for  the  next  time  sample;  the 
elements  b}  ( yt )  represent  the  probability  of  observing  the  output  y  at  time  t  if  the 
channel  is  in  state  q}  at  time  t;  and  the  elements  ni  represent  the  probability  of  the 
channel  starting  in  state  qt  (Fig.  2;  Chung  et  al.,  1990). 

From  the  set  of  all  possible  models,  we  wish  to  choose  the  model  X  which  most 
likely  produced  the  observed  data.  Maximum-likelihood  theory  dictates  that  this 
model  is  that  for  which  the  data  have  the  greatest  probability.  Although  these  two 
statements  may  seem  identical,  the  concepts  are  distinct  —  the  former  being  the 
probability  of  the  model  given  the  data,  the  latter  being  the  probability  of  the  data 
given  the  model  [see  Colquhoun  and  Sig worth  (1983)  for  a  fuller  discussion  of  the 
theory].  To  achieve  our  selection,  we  begin  by  selecting  a  starting  model  X,  by 
whatever  method  we  so  chose,  and  for  this  model  define  two  partial  observation 
probabilities:  ak  (j)  =  P{  Yk  ,sk  =  q}  | A} ,  the  probability  of  observing  the  first  k 
measurements  and  of  being  in  state  q}  at  time  k,  and  (3k(i)  =  p{yt  -  =  qt  ,X],  the 

probability  given  the  assumption  that  the  channel  is  in  state  qt  at  time  k  of  observing 
the  last  T-k  measurements  (Fig.  2).  These  probabilities  are  calculated  recursively  by 
a  procedure  called  the  Forward-Backward  Procedure  (Rabiner,  1989): 


<*i(j)  =  nJbJ(yl), 


&t+ 1 0) — 


.  «■= i 


bj(y,+ J, 


l<j<N 


1  <j<N 


(1  a) 
(lb) 


and 
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M0  =  1,  l<j<N  (2a) 

/3,(0  =  Ia,A(>».)^.  O').  isysw,  (2*) 

7=1 

where  /V  is  the  number  of  allowed  channel  states.  We  use  these  intermediate  values  to 
determine  two  additional  probabilities,  yt  (i)  =  P{s,.  =  q^Yj.  ,A},  the  probability  given 
the  observation  sequence  of  being  in  state  qt  at  time  t,  and  (ij)  =  P{jt  =  q. , 
st+ 1  =  qj\YT,X],  the  probability  given  the  observation  sequence  of  being  in  state  qt  at 
time  t  and  making  a  subsequent  transition  to  state  qj : 


7,(0  = 


(0ft,(0 

p{yt\x} 


and 


(3) 


£(U) 


<Xi(i)aiMyt+i)Pt+lU) 

p{yt  |A} 


(4) 


With  these  calculated  variables,  a  new  set  of  parameters  A ,  B,  and  7t  is  determined  by 
the  Baum- Welch  re-estimation  formulas  (Rabiner,  1989): 


II 

itT 

1  <  i  <  N 

(5a) 

t= 1  /  t=\ 

\<i<  N 

(5b) 

T  /  T 

bi(k)  =  ^Y,{ 0  /X/,(0, 

1  <i<N. 

(5c) 

i=i  /  ,=i 


y,=k 


Baum  and  Sell  (1968)  have  shown  that  the  probability  of  generating  the  observed  data 
is  as  great  or  greater  for  this  new  model  A,  than  for  the  initial  model.  Iteration 
consequently  results  in  successively  better  models,  in  the  maximum-likelihood  sense. 
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Meta-State  Hidden  Markov  Modelling 

Inherent  to  the  Forward-Backward  procedure  is  the  assumption  that  the  output 
probability  density  functions,  bt  (y, ),  are  dependent  only  upon  the  current  state.  The 
traditional  model  consequently  assumes  an  input  which  is  the  sum  of  signal  terms  and 
noise  terms,  yt  =  y(s t )  +  nt ,  whose  probability  distributions  depend  only  on  the  present 
state  of  the  channel.  To  address  the  case  in  which  the  noise  term  is  dependent  on 
previous  noise  terms,  we  begin  by  assuming  the  noise  to  result  from  an  autoregressive 
(AR)  process,  and  filter  the  signal  to  remove  the  noise  correlation  in  a  process  known 
as  pre-whitening  (Fig.  1C;  Kay  and  Marple,  1981).  By  passing  an  autoregressive 
signal  through  an  appropriately  chosen  filter  of  the  form  A(z)  =  1  +  Takz~k,l<k<p, 
we  arrive  at  an  output  power  density  spectrum  which  is  flat,  or  "white."  Kay  and 
Marple  (1981)  note  that  even  if  the  noise  does  not  arise  from  an  AR  process,  the 
resultant  power  spectrum  "will  still  be  flatter  than  the  input  and  'approximately' 
white."  The  filter  coefficients,  ak,  are  chosen  by  the  equation 


X(o) 

*„(-!)  • 

•  K(~p) 

"  l ' 

V 

JU-0 

K  (0) 

•  *«(-(/>-!)) 

ax 

— 

0 

*„(/>-!)  ■ 

•  RJ  0) 

_aP_ 

0 

where  R^m)  is  the  autocorrelation  at  lag  m  of  a  stretch  of  sampled  data  which  clearly 
has  only  a  noise  component,  nt,  and  o  is  the  standard  deviation  of  this  noise.  The 
Levinson-Durbin  algorithm  (Kay  and  Marple,  1981)  provides  an  efficient  way  of 
solving  this  equation. 

By  applying  the  resulting  filter  to  the  sampled  data,  yt  =  y(s, )  +  nt ,  we  obtain  a 
filtered  signal,  y'  =  y'(st )  +  n't ,  consisting  of  a  corrupted  signal,  y'(.s, )  =  y (j,  )  + 
'Laky{st_k ),  1  <k<  p,  which  clearly  depends  on  previous  states,  and  a  now 
uncorrelated  n' .  If  we  consider  the  state  sequence  (s’  )  to  be  a  "meta-state" 

(Fig.  3),  we  note  that  y'  is  dependent  solely  upon  the  current  meta-state  and  an 
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uncorrelated  noise.  Consequently,  we  may  apply  the  forward-backward  method  to  the 
data  and  meta-states  and  use  modified  Baum-Welch  equations  to  re-estimate  our  model 

A.  A 

(Fig.  2).  We  define  the  probabilities  a,/3,y  and  £  for  the  meta-states  as  oc,p,y and  ^ 
were  defined  for  the  states: 


a,(J) 

=  P{Y,,o,  =  MJ  |A} 

(la) 

P ,(/) 

II 

S- 

1 

Q 

II 

(lb) 

/,(/) 

II 

T 

n 

(7c) 

1(1, J) 

=  P{<r,  =  M,,om  =Mj\Yt,X} 

(Id) 

where  o t  -  M}  denotes  that  the  most  recent p+ 1  states  a,  =  (s,_p...st)  correspond 
one-to-one  with  those  of  meta-state  My  =  (<7yi.-.<7y(/,+1))).  The  Baum-Welch  equations 
become: 


Ki  =ZTi(/) 

M, 


T- 1 


T- 1 


M,  Mj  l=l  /  M,  l=l 


*,(*)= XXr.m  Hf, a) 

M,  '=1  /  M,  l= 1 

y,=k 


+ 

II 

(8a) 

*?/(/>+ 1)  Qi 

(8*) 

Qj(p+ i)  -  Qj 

Qi(p+ 1)  ~  Qi  • 

(8c) 

Since  these  modifications  share  the  assumptions  of  the  theorem  of  Baum  and 


Sell  (1968),  iteration  of  this  method  similarly  refines  the  model  estimation. 
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METHODS 

Calculations  were  programmed  in  Modula-2  and  implemented  on  Motorola 
68030  and  68040  based  computers. 

Simulation  of  the  Input  Data 

The  traditional  hidden  Markov  method  and  the  method  modified  to  account  for 
correlated  noise  were  compared  by  applying  both  to  a  signal  consisting  of  simulated 
current  data  superimposed  upon  noise  from  a  typical  patch-clamp  experiment. 

A  Markov  model  was  chosen  and  data  were  generated  in  a  manner  similar  to 
that  described  by  Blatz  and  Magleby  (1986).  The  starting  state  was  determined  by 
summing  the  elements  of  the  starting  probability  vector  n  until  the  sum  was  greater 
than  a  random  number  between  0  and  1 .  The  index  of  n  which  was  reached  defined  the 
state,  and  the  current  corresponding  to  this  state  defined  the  starting  current. 

Sequential  states  were  determined  in  order  by  application  of  the  above  process  using  the 
transition  matrix  A  and  newly  generated  random  numbers.  Current  levels 
corresponding  to  the  state  sequence  defined  the  channel  current  sequence.  These 
uncorrupted  data  correspond  to  S(z)  in  Fig.  1A. 

For  the  noise  data,  a  string  of  recorded  data  was  pre-processed  by  passing  it 
through  a  filter  whose  transfer  function  was  the  inverse  of  the  recording  equipment. 

The  characteristics  of  this  filter  were  determined  by  analyzing  the  response  of  the 
recording  equipment  to  an  injected  square-wave  current.  By  filtering  the  recorded  data, 
bandwidth  was  returned  to  the  signal  and  the  unfiltered  signal  requirement  of  the 
hidden  Markov  model  (Fig.  IB)  was  more  closely  approximated.  A  stretch  of  post- 
processed  data  which  had  no  apparent  channel  activity  was  selected  for  the  noise.  This 
noise  and  the  generated  signal  were  added  to  produce  a  typical  input  to  the  hidden 
Markov  algorithms. 
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For  this  study,  the  noise  data  were  kindly  supplied  by  Dr.  Steven  M.  Sine  of  the 
Department  of  Cellular  and  Molecular  Physiology,  Yale  University  School  of 
Medicine.  These  data  were  obtained  in  a  cell-attached  recording  from  mouse 
fibroblasts,  filtered  at  20  kHz  by  an  8-pole  Bessel  filter,  and  sampled  at  94  kHz  (Sine 
et  al.,  1990).  The  determination  of  the  inverse  filter  and  the  pre-processing  of  the 
noise  data  were  kindly  performed  by  Dr.  Fred  J.  Sigworth,  also  of  the  Department  of 
Cellular  and  Molecular  Physiology,  Yale  University  School  of  Medicine. 

Two-D  Likelihood  Contours 

For  the  data  generated  in  the  manner  described  above,  the  "forward"  parts  of 
the  forward-backward  algorithms  of  each  of  the  hidden  Markov  algorithms  were  used 
to  compute  log-likelihood  contour  maps.  Colquhoun  and  Sigworth  (1983)  note  that 
likelihood  intervals  provide  measures  of  the  errors  of  estimation  equivalent  to  those  of 
standard  errors.  Consequently,  assessing  the  efficacy  of  each  of  the  two  methods  by 
likelihood  contours  permitted  a  comparison  of  the  two  methods:  where  they  had  their 
peaks,  how  large  their  2-unit  likelihood  contours  were,  and  whether  their  2-unit 
likelihood  contours  enclosed  the  actual  transition  rates. 

Reducing  the  computational  effort 

Since  the  number  of  calculations  required  for  the  forward-backward  algorithm  is 
on  the  order  of  N2T,  where  N  is  the  number  of  meta-states,  we  were  behooved  to 
recognize  meta-states  that  would  never  be  visited  so  as  to  limit  the  size  of  N. 
Consequently,  only  the  n-tuples  of  states  which  were  permitted  by  the  transition 
probability  matrix  A  were  enumerated  as  meta-states  .  That  is,  [qt . . .  qj  ,qk . . .  q, )  was 
not  listed  as  a  meta-state  if  ajk  were  zero.  Secondly,  since  meta-state  ( qx  ,q2...qk) 
could  only  make  a  transition  to  meta-state  (q2.. .qk,qt),  and  only  if  au  were  non-zero 
(see  Fig.  3),  for  each  meta-state  a  list  of  allowable  transitions  was  kept  so  that 
improvident  calculations  of  zero-terms  and  addition  of  these  into  the  forward-backward 
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algorithm  could  be  prevented.  A  model  with  three  states,  for  example,  for  which 
triplets  of  states  serve  as  meta-states,  has  27  meta-states  and  consequently  requires  on 
the  order  of  129T  calculations.  If  only  one  of  the  nine  possible  state  transitions  is 
assumed  to  be  zero,  six  of  the  meta-states  could  be  eliminated,  leaving  on  the  order  of 
441 T  calculations.  Furthermore,  of  these  remaining  441  transition  possibilities,  394 
are  not  allowed  by  the  second  criterion  above.  Thus,  only  on  the  order  of  47 T 
calculations  are  required,  a  savings  of  94%  in  computational  requirements.  Finally,  we 
obviated  the  need  to  calculate  since  inherent  to  every  meta-state  is  a  last 

transition.  Consequently,  a xj  was  calculated  by: 


T 


for  M,  s.t.  qHptl)  =  gt 

and  q,p  =  q, 

for  Mj  s.t.  qHptn=q, 


XXr,(/) 


M,  l= 1 


(9) 


T 


XX^(y) 


Mj  f=l 


11 


RESULTS 

The  Noise,  its  "color",  and  the  Simulated  Signal 

The  traces  in  Fig.  4  display  the  first  1024  points  of  a  10,000  point  simulation 
generated  to  compare  the  two  hidden  Markov  methods.  The  top  trace  displays  noise 
recorded  during  a  period  of  no  channel  activity.  The  points  shown  have  already  been 
post-processed  and  thus  correspond  to  n(z)A^  (z)  in  Fig.  IB.  The  mean  has  been 
subtracted  from  the  10,000  points  and  the  data  normalized  to  have  a  standard  deviation 
of  0.5.  The  autocorrelation  function  of  this  noise  is  displayed  in  Fig.  5. 

The  function  in  Fig.  5  was  obtained  by  autocorrelating  1024  representative  post- 
processed  points  occurring  earlier  in  the  recording.  In  the  figure,  the  large  negative 
value  at  the  first  time  lag  indicates  that  the  noise  is  strongly  correlated.  The  correlation 
results  from  the  fact  that  a  roughly  white  voltage  noise  in  the  patch-clamp  input 
amplifier  is  differentiated  through  the  input  capacitance  of  the  recording  equipment  to 
appear  as  a  current  noise  with  a  spectral  density  that  increases  with  frequency 
(Sigworth,  1983).  The  dashed  line  in  the  figure  shows  the  result  of  passing  the  noise 
through  a  pre-whitening  filter  with  three  degrees-of-freedom.  The  three  coefficients 
are  determined  by  the  Levinson-Durbin  algorithm  and  the  filter  is  subsequently  scaled 
to  have  a  unit-step  response  which  arbitrarily  reaches  a  level  of  one.  The  resulting 
signal,  n(z)A~l  (z)B(z)  in  Fig.  1C,  is  clearly  "whiter"  than  the  initial  signal. 

The  second  trace  in  Fig.  4  is  the  simulated  current  data,  S(z )  in  Fig.  IB, 
reflecting  the  scheme: 

0.1  v 

C  <■  »  O. 

03 

The  third  trace  shows  the  sum  of  the  post-processed  noise  and  the  simulated  signal,  and 
the  fourth  trace  shows  the  result  of  passing  the  post-processed  noise  and  signal  through 
the  pre-whitening  filter,  i.e.  S(z)B(z)  +  n(z)A~1  (z)B(z)  in  Fig.  1C. 
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The  bottom  trace  in  Fig.  4  is  presented  for  completeness.  It  shows  the  signal 
sequence  predicted  by  the  Viterbi  algorithm  (a  technique  for  finding  the  state  sequence 
with  the  highest  probability,  given  the  model;  Rabiner,  1989). 

Comparison  of  the  two  methods  by  likelihood  contours 

The  log-likelihood  plots  resulting  from  processing  the  simulated  signal  plus 
the  noise  through  each  of  the  two  hidden  Markov  methods  are  presented  in  Fig.  6.  The 
2-unit  likelihood  contours  are  shown  by  dashed  lines.  In  the  top  graph,  the  traditional 
hidden  Markov  method,  the  maximum  value  of  the  contour  occurs  on  the  high  side  for 
both  switching  probabilities  and  the  2-unit  contour,  representing  the  error,  does  not 
contain  the  actual  state  transition  probabilities  0.1  and  0.3.  The  traditional  hidden 
Markov  method  appears  to  be  fooled  into  thinking  that  the  correlated  noise  is  in  fact 
high  frequency  channel  switching.  This  guise  is  correctly  seen  through  by  the  modified 
method  (bottom  graph)  which  places  the  maximum  likelihood  close  to  the  actual  values, 
and  whose  2-unit  contour  contains  the  correct  values. 

That  the  failure  of  the  traditional  method  is  in  fact  a  consequence  of  its 
interpretation  of  the  noise  is  most  clearly  demonstrated  in  the  subsequent  two  figures. 

In  this  experiment,  the  signal  has  remained  the  same  and  the  noise  has  been  doubled, 
thereby  halving  the  signal-to-noise  ratio.  As  seen  in  Fig.  8,  the  traditional  model 
fancies  higher  switching  rates  with  increasing  noise.  The  modified  method  once  again 
interprets  the  noise  correctly. 

Demonstration  on  a  three-state  model 

To  demonstrate  the  effectiveness  of  the  model  in  determining  rate  constants  for 
more  complicated  switching  models,  the  noise  was  retained  and  the  signal  regenerated 
to  reflect  the  scheme: 
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Figure  9  displays  the  noise,  signal,  the  signal-plus-noise  and  the  transition  rates 
estimated  by  the  modified  hidden  Markov  method.  The  estimates  of  the  transition  rates 
are  not  too  far  off,  and  are  especially  remarkable  in  light  of  both  the  signal  from  which 
they  were  derived  and  the  exceedingly  poor  initial  guess  of  the  parameters.  In  fact,  as 
noted  by  Colquhoun  and  Sigworth  (1983),  the  standard  deviation  of  sampled  lifetimes 
is  n~m  times  the  true  lifetime,  where  n  is  the  number  of  observed  samples.  Since  there 
were  on  the  order  of  450  openings  per  10,000  point  sample,  the  opening  and  closing 
switching  rates  could  be  off  on  the  order  of  5  %  by  virtue  of  sampling  alone.  The 
hidden  state,  since  it  is  visited  less  often,  would  be  expected  to  have  higher  errors. 
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DISCUSSION 

We  have  described  here  an  extension  of  a  maximum-likelihood  method  for 
determining  the  kinetic  rate  constants  from  single  channel  data.  Our  method  accounts 
for  the  effects  of  filtering  by  the  recording  apparatus,  and  for  the  correlated  nature  of 
the  noise.  We  make  the  assumption  that  the  gating  of  the  channel  is  described  by  a 
time-homogeneous  Markov  process. 

As  evidenced  by  the  likelihood  contours  in  Figs.  6  and  8,  the  extended  hidden 
Markov  method  appears  to  demonstrate  its  greatest  benefit  over  the  standard  hidden 
Markov  method,  and  consequently  over  the  traditional  approaches,  by  allowing  data 
with  much  smaller  signal-to-noise  ratios  to  be  accurately  analyzed. 

We  have  limited  our  presentation  to  models  with  only  two  conductance  states. 
Chung  et  al.  (1990)  have,  in  fact,  already  demonstrated  the  power  of  the  hidden 
Markov  method  in  determining  sub-conductance  levels  in  the  presence  of  white  noise. 
We  expect  the  extension  of  the  modified  method  to  handle  sub-conductances  to  be 
relatively  straight  forward.  We  also  have  not  addressed  the  issue  of  competing  models 
since  methods  of  comparing  their  likelihoods  have  been  well  described  (Ball  and 
Sansom,  1989). 

Advantages 

Our  method  offers  the  following  advantages: 

(1)  It  is  a  maximum-likelihood  method  with  a  conceptually  straight-forward 

theory. 

(2)  It  allows  one  to  work  at  far  lower  signal-to-noise  ratios. 

(3)  It  accounts  for  the  true  nature  of  the  noise. 

(4)  All  data  points  are  used  at  one  time,  thereby  preserving  the  information 
contained  in  the  sequential  gating  pattern. 

(5)  It  works  well  with  limited  amounts  of  data. 
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(6)  There  is  no  observer  bias,  save  in  the  selection  of  noise  for  noise 
characterization. 

(7)  The  method  is  relatively  fast.  The  method  applied  to  a  two  state  model 
with  10,000  data  points  will  typically  converge  to  an  estimation  on  the  order  of  one 
minute. 

Disadvantages 

There  are  two  major  disadvantages  to  the  modified  method: 

(1)  It  assumes  the  channel  gating  to  be  governed  by  a  Markov  process. 
However,  even  if  the  gating  is  not  Markovian,  it  may  be  sufficiently  well  described  by 
a  Markov  process  for  purposes  of  modelling  its  behavior. 

(2)  The  computational  demands  of  the  method  grow  as  the  square  of  the 
number  of  "meta-states"  and  linearly  with  the  number  of  data  points.  Large  models 
will  either  cause  the  computer  to  run  out  of  memory,  or  slow  the  calculations 
inordinately.  Consequently,  anything  which  can  be  done  to  simplify  a  complicated 
model  before  running  the  hidden  Markov  method  would  be  beneficial  in  terms  of 
minimizing  computation  time  and  memory  requirements. 

Conclusion 

The  modification  of  the  hidden  Markov  method  to  account  for  the  effects  of 
data  acquisition  filtering  and  for  correlated  noise  has  been  shown  to  provide  a 
substantial  improvement  over  the  existing  method  in  its  ability  to  estimate  of  kinetic 
rates  of  data  generated  by  a  time-homogeneous  Markov  process.  When  used  alone  for 
small  models,  or  in  conjunction  with  other  methods  for  large  models,  the  method 
contributes  greatly  to  our  ability  to  glean  information  from  noisy  data. 


16 


(Please  See  Reverse  for  Fig.  1  Legend) 


Figure  1.  Schematic  diagram  of  the  data  processing.  (1)  Noise  from  the  environment 
and  from  the  amplifying  equipment  invariably  corrupts  signals  acquired  by  the  patch- 
clamp  technique.  (2)  As  a  first  step  in  processing  the  recorded  data,  the  data  are 
presented  to  a  filter  designed  to  negate,  as  far  as  possible,  the  modification  of  the  signal 
caused  by  the  measuring/recording  system.  (3)  A  stretch  of  the  resulting  data  which 
clearly  has  only  a  noise  component  is  analyzed  to  determine  the  characteristics  of  the 
noise,  and  the  data  are  passed  through  a  filter  designed  to  uncorrelate  the  noise  based 
on  these  characteristics.  (4)  The  resulting  transformed  signal  and  white  noise, 

S(z)B(z)  +  n(z)A_1(z)B(z),  are  presented  to  the  modified  hidden  Markov  process  for 
iterative  re-estimations  of  the  model  parameters. 
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(Please  See  Reverse  for  Fig.  2  Legend) 


Figure  2.  Elements  and  re-estimation  formulas  of  the  models.  Hidden  Markov  models 
are  defined  by  their  initial  state  probabilities,  transition  probabilities  and  observation 
probability  density  functions.  Combined  in  appropriate  ways,  these  elements  may 
iteratively  determine  intermediate  probabilities,  at  least  one  of  which  provides  a  direct 
measure  of  the  probability  of  that  hidden  Markov  model  generating  the  data  (Za^T)). 
The  measures  for  different  combinations  of  model  parameters  define  the  likelihood 
contours  associated  with  the  data.  Re-estimation  of  parameters  may  be  approached  in 
many  ways.  Here  it  is  performed  using  the  Baum-Welch  formulas  which  derive  re¬ 
estimations  from  the  intermediate  probability  values. 

Meta-state  hidden  Markov  modelling  procedes  almost  identically  with  the 
classical  modelling  by  defining  the  equations  in  terms  of  sequences  of  states  ( in  the 
figure,  triplets  ). 
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(Please  See  Reverse  for  Fig.  3  Legend) 


Figure  3.  Structure  of  the  meta-states.  The  top  figure  shows  a  classical  Markov  model 
consisting  of  two  states  and  the  transition  probabilities  between  them.  The  lower  figure 
is  the  meta-state  model  of  this  classical  model  for  sequences  of  three  states.  Meta¬ 
states  consist  of  ^-tuples  of  classical  states  corresponding  to  the  sequence  of  state 
transitions  up  to  the  most  recent  state.  The  number  of  meta-states  is  therefore  the 
number  of  permutations  of  classical  states  k  taken  n  at  a  time,  or  kn  (in  this  case  23,  or 
8).  If  the  classical  transition  probabilities  were  to  prohibit  certain  sequences  of  states 
from  occurring,  the  number  of  meta-states  would,  however,  be  smaller.  Note  that  the 
connections  between  meta-states  is  limited  since  the  last  n-1  states  in  one  meta-state 
must  be  the  first  n-1  states  in  the  subsequent  meta-state.  For  example,  state  001  can 
only  go  to  state  010  or  state  Oil,  not  state  111.  Further  note  that  the  transition 
probabilities  between  meta-states  are  simply  the  classical  probabilities  for  transitions 
between  the  last  elements. 
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(Please  See  Reverse  for  Fig.  4  Legend) 


Figure  4.  The  meta-state  algorithm  applied  to  data  with  a  noise  standard  deviation  of 
0.5.  The  top  trace  displays  noise  which  was  generated  by  94  kHz  sampling  of  the 
background  in  a  typical  patch-clamp  setup,  followed  by  passage  through  the 
corresponding  inverse  filter.  The  noise  was  scaled  to  yield  a  standard  deviation  of  0.5. 
This  was  added  to  a  signal  (second  trace)  generated  by  a  two-state  Markov  model  with 
current  levels  of  0  in  state  0  and  1  in  state  1,  and  with  transition  probabilities  aoj  of  0.1 
and  a10of  0.3.  The  middle  trace  displays  the  sum.  The  sum  was  subsequently  filtered 
by  a  four-element  pre- whitening  filter  (next  to  last  trace),  and  a  most-likely  signal  (last 
trace)  was  calculated  via  the  Viterbi  algorithm. 
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(Please  See  Reverse  for  Fig.  5  Legend) 


Figure  5.  Autocorrelation  of  recorded  noise.  The  solid  curve  displays  lags  0  through 
25  of  a  normalized  autocorrelation  of  a  1024  point  set  of  representative  noise.  Since 
white  noise  would  be  represented  by  a  peak  at  lag  0  followed  by  a  flat  line  between 
lags  1  and  25,  the  presence  of  a  large  dip  at  lag  1  indicates  a  large  correlated 
component.  These  autocorrelation  values  are  used  by  the  Levinson-Durbin  algorithm 
to  calculate  the  coefficients  of  the  pre- Whitening  filter.  The  dashed  line  displays  the 
autocorrelation  values  of  the  noise  after  it  has  been  whitened  by  a  pre-whitening  filter 
with  four  coefficients  (  ag  =  0.490,  a {  =  0.299,  a2  =  0.159,  a3  =  0.052  ).  The  filter 
coefficients  were  normalized  so  that  the  area  under  the  filter  was  1 . 
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(Please  See  Reverse  for  Fig.  6  Legend) 


Figure  6.  Likelihood  contours  for  a  10,000  point  string  of  data  consisting  of  noise 
with  a  standard  deviation  of  0.5  and  a  signal  generated  by  a  two-state  Markov  model 
with  transition  probabilities  ag,  =0.1  and  a10  =  0.3.  The  dashed  contours  represent 
the  boundaries  of  the  contour  which  is  2  natural-log  units  below  the  peak.  Surface  A 
represents  the  likelihood  surface  generated  by  the  classical  hidden  Markov  technique, 
whereas  surface  B  represents  that  for  the  meta-state  extension.  We  interpret  the 
deviation  of  the  classical  likelihood  surface  as  indicating  that  the  classical  method 
construes  the  noise  to  be  a  rapidly  switching  signal. 
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(Please  See  Reverse  for  Fig.  7  Legend) 


Figure  7.  The  meta-state  algorithm  applied  to  data  with  a  noise  standard  deviation  of 
1.0.  The  data  and  noise  are  the  same  as  that  in  Fig.  4  with  the  exception  that  the  noise 
was  scaled  up  to  a  standard  deviation  of  1.0.  We  note  incidentally  that  the  ability  of 
the  Viterbi  algorithm  to  estimate  the  signal  sequence  has  become  a  bit  strained  at  this 
low  signal-to-noise  ratio. 
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(Please  See  Reverse  for  Fig.  8  Legend) 


Figure  8.  Likelihood  contours  for  a  10,000  point  string  of  data  consisting  of  noise 
with  a  standard  deviation  of  1.0  and  a  signal  generated  by  a  two-state  Markov  model 
with  transition  probabilities  ao,  =0.1  and  a10  =  0.3.  The  dashed  contours  represent 
the  boundaries  of  the  contour  which  is  2  natural-log  units  below  the  peak.  The  top 
surface  represents  the  likelihood  surface  generated  by  the  classical  hidden  Markov 
technique,  whereas  the  bottom  surface  represents  that  for  the  meta-state  extension.  The 
classical  method  clearly  has  difficulty  distinguishing  the  noise  from  a  fast  switching 
current,  whereas  the  modified  method  still  includes  the  true  rates  within  its  error 
bounds. 

The  arrows  on  the  lower  surface  map  out  the  successive  Baum-Welch  re¬ 
estimations  of  the  switching  probabilities,  and  demonstrate  the  ability  to  get  close  to  the 
peak  of  the  likelihood  contour  fairly  quickly  even  when  quite  incorrect  initial  guesses 
(represented  by  the  small  square  )  are  made. 
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(Please  See  Reverse  for  Fig.  9  Legend) 


Figure  9.  Predicting  hidden-state  transitions.  Noise  derived  in  the  same  manner  as 
that  of  Fig.  4  was  scaled  to  yield  a  standard  deviation  of  1.0  and  was  added  to  a  10,000 
point  signal  derived  from  the  three-state  Markov  process  outlined  in  the  figure.  As 
seen  in  the  figure,  the  method  did  well  despite  an  extremely  poor  initial  guess.  The 
values  arrived  at  are,  as  well,  those  which  are  arrived  at  when  the  initial  guess  is  the 
generating  values  themselves. 
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